A rule-based approach to Farsi language text-to-phoneme conversion
Authors
Abstract
Converting orthographic (written) text into a phonetic transcription is the first stage in a text-to-speech system. In this study, algorithms are presented to facilitate text-to-phoneme (TTP) conversion for the Farsi language. Using a lexicon of about 15000 base morphemes, word formation rules are investigated and implemented. Moreover, the written sentence has to be segmented into words prior to any phonetic transcription of the text. Due to the special form of Farsi orthography, word segmentation is a complicated process. To solve the problem, a fast on-line algorithm and a more complicated off-line algorithm are presented. The overall performance of the TTP conversion is evaluated to be more than
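As a rough illustration of lexicon-driven segmentation followed by phoneme lookup, the sketch below splits an unspaced string into known morphemes and maps each one to a transcription. It is not the algorithm from the paper, whose details are not given here. The names `TOY_LEXICON`, `segment`, and `to_phonemes`, the two-entry lexicon, the romanized transcriptions, and the fewest-pieces preference are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical toy entries (assumption, not taken from the paper's lexicon):
# written morpheme -> illustrative romanized transcription.
BOOK = "کتاب"
PLURAL = "ها"
TOY_LEXICON: Dict[str, str] = {BOOK: "ketâb", PLURAL: "hâ"}


def segment(text: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Split text into lexicon morphemes, preferring the fewest pieces.

    Returns None when no full segmentation exists.
    """
    n = len(text)
    # best[i] holds the best segmentation of text[:i], or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = text[start:end]
            if prefix is None or piece not in lexicon:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


def to_phonemes(text: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Segment text, then look up the transcription of each morpheme."""
    pieces = segment(text, lexicon)
    if pieces is None:
        return None
    return [lexicon[p] for p in pieces]
```

Calling `to_phonemes(BOOK + PLURAL, TOY_LEXICON)` segments the joined string into its two morphemes before lookup. A real system would also need word formation rules and a fallback for out-of-lexicon strings, which this sketch leaves out.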
Similar resources
Grapheme to phoneme conversion: an Arabic dialect case
We aim to develop a Speech-to-Speech translation system between Modern Standard Arabic and the Algiers dialect. Such a system must include a Text-to-Speech module, which in turn must include a Grapheme-to-Phoneme converter. The Algiers dialect is an Arabic dialect that shares most of the NLP problems of Modern Standard Arabic. Furthermore, it could be considered as an under-resourced language becau...
Rule-based Korean Grapheme to Phoneme Conversion Using Sound Patterns
Grapheme-to-phoneme conversion plays an important role in text-to-speech applications and other fields of computational linguistics. Although Korean uses a phonemic writing system, it still needs grapheme-to-phoneme conversion for speech synthesis because the Korean writing system does not always reflect actual pronunciation. This paper describes a grapheme-to-phoneme conversion method based o...
Grapheme-to-Phoneme conversion, a knowledge-based approach
This paper reflects the results of an ongoing project at Högskolan i Skövde, aimed at creating a system for grapheme-to-phoneme conversion for Swedish using a knowledge-based approach. The focus lies on the development and implementation of an algorithm for parsing orthographic text, and of phonetic rules for the transcription.
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
A Semantic Approach to Person Profile Extraction from Farsi Documents
Entity profiling (EP), an important task in Web mining and information extraction (IE), is the process of extracting the entities in question and their related information from given text resources. From a computational viewpoint, the Farsi language is one of the less-studied and less-resourced languages, and suffers from the lack of high-quality language processing tools. This problem emphasizes th...